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Abstract: A powerful study design in the fields of genomics and meta- 
bolomics is the 'replicated time course experiment' where individual time 
series are observed for a sample of biological units, such as human patients, 
termed replicates. Standard practice for analysing these data sets is to fit 
each variable (e.g. gene transcript) independently with a functional mixed- 
effects model to account for between-replicate variance. However, such an 
independence assumption is biologically implausible given that the vari- 
ables are known to be highly correlated. 

In this article we present a skew-t-normal multi-level reduced-rank func- 
tional principal components analysis (FPCA) model for simultaneously 
modelling the between- variable and between-replicate variance. The reduced- 
rank FPCA model is computationally efficient and, analogously with a 
standard PCA for vectorial data, provides a low dimensional representa- 
tion that can be used to identify the major patterns of temporal variation. 
Using an example case study exploring the genetic response to BCG in- 
fection we demonstrate that these low dimensional representations are em- 
inently biologically interpretable. We also show using a simulation study 
that modelling all variables simultaneously greatly reduces the estimation 
error compared to the independence assumption. 



'We are grateful to Cheryl Hemingway and Timothy Ebbels for providing access to 
example data sets used in this article 
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1. Introduction 

Genomics and metabolomics are two examples of a broader range of 'omics 
domains, each of which characterises a biological organism at a different level 
of biomolecular organisation. A powerful study design within these fields is the 
'replicated time course experiment', in which a sample of biological units such 
as human patients or laboratory rats, termed 'replicates', is studied over time in 
order to infer the temporal behaviour of the population as a whole. As biological 
processes are inherently dynamic, these time series experiments provide greater 
insight than static analyses. Data arising from these experiments presents some 
unique challenges compared to more traditional time series analysis application 
domains. In particular, the time series are very short with 5 to 10 time points 
being typical. This is due to the expense involved in collecting observations, 
not purely in monetary terms but also due to ethical concerns about obtaining 
the biological samples, and the laboratory time needed to conduct the assays. 
The number of replicates are equally small. Due to the specifics of the assaying 
technologies utilised, the observations are often collected with a great deal of 
noise, and some may simply be missing. Missing data may also arise by design, 
especially when it is necessary to sacrifice replicates in animal studies in order 
to obtain the biological samples. Experimental design may also lead to the time 
points being irregularly spaced, in an attempt to exploit a priori knowledge 
about the temporal behaviour. Finally, the data is high dimensional with tens 
of thousands of variables (e.g. gene transcripts) under study simultaneously. 

In order to deal with these issues, functional data analysis (FDA) (Ramsay 
and Silverman, 2005) has become a popular modelling choice in the field of ge- 
nomics time series data analysis. In FDA, we assume that our observations are 
noisy realisations of an underlying smooth function of time (or, analogously, a 
curve) which is to be estimated. After estimation, this function is then treated 
as the fundamental unit of data in any subsequent analysis, such as cluster- 
ing or network inference. Within the field of genomics, FDA approaches have 
been proposed for clustering unreplicated data sets (Ma et al., 2006), detect- 
ing significant genes in multi-sample replicated data sets (Storey et al., 2005) 
and detecting significant genes in cross-sectional studies (Angelini, Canditiis 
and Pensky, 2009). We ourselves have demonstrated that such methodology is 
equally well-suited to the field of metabolomics (Berk, Ebbels and Montana, 
2011; Montana, Berk and Ebbels, 2011). 

Despite the seeming essentiality of replication, the replicated time course 
study design is rare, possibly due to a lack of appreciation that without repli- 
cation inference is restricted to the single sample under study alone, or perhaps 
limited by the few adequate modelling choices available. This limited range of 
statistical methodology has no doubt arisen from the complexity involved in 
simultaneously modelling the covariance between the variables and, for a given 
variable, between the replicates. This complexity is exacerbated by the high 
dimensionality of the data which incurs a significant computational cost. 

There are only a handful of approaches specifically designed for replicated 
genomics time series data sets that we are aware of. The first of these, proposed 
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by Tai and Speed (2009), does not truly account for time as a quantitative vari- 
able in the sense that the results of an analysis would be the same if the time 
points were to be permuted. Furthermore, it cannot handle missing data without 
resorting to undesirable imputation procedures. The other two approaches, the 
functional mixed-effects model of Storey et al. (2005), and the functional prin- 
cipal components analysis (FPCA) method proposed by Liu and Yang (2009) 
both opt to avoid the complexity of simultaneously modelling both levels of co- 
variance by instead modelling each variable independently. In the case of Storey 
et al. (2005), each variable is summarised with a mean curve across all repli- 
cates. The replicate effects are treated as scalar shifts from this mean curve, so 
that each replicate exhibits exactly the same temporal profile. We have previ- 
ously extended this approach to allow for more realistic heterogeneous replicate 
behaviour by treating the replicate effects themselves as curves (Berk et al., 
2010). Under this model, each variable can be summarised with a mean curve 
and covariance surface which describes the replicate heterogeneity. Similarly, 
Liu and Yang (2009) use the 'principal analysis through conditional expecta- 
tion' (PACE) method of Yao, Miiller and Wang (2005), also summarising each 
variable with a mean curve, but this time with an eigen-decomposition of the 
covariance surface. 

All of the methods mentioned above rely on the assumption of independence 
between variables. It is clear to understand the motivation for this independence 
assumption as, while it may be biologically unjustified, it greatly simplifies the 
analysis. However, given that there have been no proposed alternatives that 
do account for both levels of covariance, it has not been possible to ascertain 
the true impact of this simplification, for example in terms of estimation error. 
However, it is reasonable to assume, given the few observations available for 
each variable, that the effect is significant. 

There have been two methods proposed for accounting for multiple levels of 
covariance outside of 'omics application domains. The first of these, introduced 
by Di et al. (2009), suggests to estimate the covariance surface at each level 
using the method of moments, which is then smoothed using thin-plate spline- 
smoothing. For dimensionality reduction, the resulting surfaces are subject to 
eigen-decompositions. We remain sceptical, however, of the applicability of such 
an approach to 'omics data sets where the tiny number of time points and few 
replicates raises significant concerns as to the ability of the method of moments 
to adequately estimate the covariance surfaces. In contrast, Di et al. (2009) 
demonstrate their method on a data set with over 3, 000 replicates and 960 time 
points. 

The other proposed approach is the multi-level reduced-rank FPCA model 
of Zhou et al. (2010), extending the single-level reduced-rank FPCA model of 
James, Hastie and Sugar (2000). This is similar to the model that we introduce 
in this article, however the key difference is that they assume the principal 
component loadings at each level are normally distributed. We will demonstrate 
here that such an assumption is untenable at the variable level for 'omics data 
sets, where the small number of variables with significantly time varying profiles, 
in conjunction with the high dimensionaliy of the data, leads to distributions 
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which exhibit a high degree of kurtosis and which may be skewed. In order to 
address this issue we propose a multi-level reduced-rank FPCA model in which 
the variable level loadings follow a skew-i-normal distribution, which is a flexible 
four parameter distribution allowing for both heavy tails and skewness. 

2. Methods 

2.1. A multi-level reduced-rank FPCA model 

We assume that the observation, such as gene expression level or NMR spec- 
trum intensity, at time t on replicate j for variable z, yij{t), is described by the 
following functional mixed-effects model: 

{t) = m(0 + f^ it) + 9ri (<) + ^^ (1) 

where is the 'grand mean' across all variables and fi{t) is the offset from 
the grand mean for variable z, so that fi{t) -\- fi(t) represents the mean function 
for variable i; gij(t) is the replicate offset from the variable mean for replicate 
j, specific to variable i; ti{t) is an error term, specific to variable i. Note that if 
the replicate effect gij {t) was not variable specific then the subscript i could be 
dropped so that the same gj{t) term was shared across all variables. However, 
both intuition and real data sets support the idea of separate replicate effects 
for each variable. We would expect in a genomics experiment, for instance, that 
certain gene transcripts display a homogeneous response across all replicates 
while others are much more heterogeneous, and it could well be the case that 
these differences lead to exactly the biological effect we are seeking to identify. 
Even when two transcripts both display heterogeneity between the replicates, 
the exact nature of that variation is likely to be transcript-specific. In Supple- 
mentary Figures 1 and 2 we give raw data for two transcripts from our example 
data set that illustrate these points. 

As in James, Hastie and Sugar (2000), we can simultaneously achieve com- 
putational efficiency, parsimony and dimensionality reduction by replacing fi (t) 
and gij{t) in (1) with their Karhunen-Loeve decompositons, yielding 

oo oo 

Vijit) = fJ.{t) + ^Ck{t)aik + ^mi{t)f3iji -I- eij{t) 

k=l 1=1 

where Cfc('^) is the k-ih principal component function at the variable level, aik is 
variable i's loading on the k-th principal component function, rju^t) is the l-th 
principal component function at the replicate level, specific to variable i and 
/3iji is replicate j's loading on the Z-th principal component function specific to 
variable i. Truncating the decompositions at the K-th and L^-th component for 
the variable and replicate level respectively gives 

K Li 

Vijit) = + ^Ck{t)aik + ^Vii{t)l3i3i + £ij{t) 

k=l 1=1 
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Note that the number of principal components retained at the rephcate level, Li , 
is variable specific as indicated by the subscript. The optimal number of principal 
components to be retained at this level will differ between variables depending 
upon the amount of between-replicate heterogeneity displayed. Representing 
the functions fJ.{t), Ck{t) and 'rjii{t) using an appropriately orthogonalised p- 
dimensional B-spline basis (Zhou, Huang and Carroll, 2008) and collecting all 
Nij observations on replicate j for variable i in the vector yij yields 

K Li 

Vij ~ Bij9^ + ^ BijOa^a^ik + ^ BijOp.^Piji + 

k=l 1=1 

where Bij is the Nij x p B-spline basis matrix that has been transformed such 
that {L/g)B'^B — I where B is the basis evaluated on a fine grid of points 
covering the range of the time course, L is the length of this fine grid, g is the 
distance between successive grid points, 0^ is a p-length vector of fitted spline 
coefRcients for the grand mean function Oa^ is a p-length vector of fitted 
spline coefficients for the k-th principal component function at the variable level 
and 6^-1 is a p-length vector of fitted spline coefficients for the ^-th principal 
component function at the replicate level. The transformation of B is required 
to enforce the FPCA orthogonality constraint that 

III' 

and similarly for rjii{t). 

Defining the p x K matrix 0^ — [6ai ■ ■ ■ ^ax] and the p x Li matrix 0^^^ = 
[Opi-^ ■ ■ ■ dpii^.] allows the summations to be simplified using matrix algebra as 

Vij = BijO^, + B.,j&a^ai + B,j&f;^(3.,j + eij (2) 

where = [an ■ ■ ■ ajj^] is the iiT-length vector formed by collecting all of the 
aik terms together and similarly for (3ij. By collecting the observations on all 
replicates j = f , • • • , for variable i, in the vector yi = [yn ■ ■ ■ yim]'^ , we can 
write 

y, ^ B.Of, + Bi&a^cxi + B,©^ A + e, 

where = [Bj^ ■ ■ ■ Bj^^]^ is the N, X p basis matrix formed by stacking each 
Bij matrix on top of one another. There are rii such matrices, each with Nij 
rows and so the matrix Bi has Ni rows where Ni = X]"=i ^ij total number 

of observations on variable i across all replicates. In contrast, the matrix Bi = 
diag(Bii, ■ ■ • , -Sirii) is a block diagonal matrix of dimension Ni x (riip) where 
the blocks correspond to the Bij matrices. Similarly, 0^. = diag(0^;, • • • , 0^.) 
is a block diagonal matrix of dimension {riiLi) x (jiiLi) where each block is 
identical and equal to 0^.. Finally, /3i = [f3ii ■ ■ ■/3i„J'^ and — [en ■ ■ ■ CmJ^. 

Standard practice would be to assume that cti, (3ij and eij are all indepen- 
dently multivariate normally distributed with zero mean and covariance matri- 
ces Da, Dp. and afl respectively. Under these assumptions, y^ is marginally 
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Fig 1. Left: An example of the poor model fits obtained under the Gaussian reduced-rank 
multi-level FPCA model viith real data. This example gene transcript is typical of variables 
across a range of data sets, where, uihile the replicate curves, indicated by the dashed lines, 
look good and closely map the underlying observations, the mean curve, given by the solid 
line, seems to be biologically implausible. Right: This poor fit is fixed when we instead adopt 
our proposed skew-t-normal multi-level reduced-rank FPCA model. 



multivariate normal and the model parameters can be estimated by treating 
the principal component loadings a.; and as missing data and employing the 
EM algorithm. Technical details for this approach can be found in the Supple- 
mentary Material. 

2.2. A skew-t-normal multi-level reduced- rank FPCA model 

We discovered upon fitting the multi-level reduced-rank FPCA model with the 
normal assumption to real data that the fits were biologically implausible, an 
example of which is given on the left hand side of Figure 1. As can be clearly 
seen, the mean curve for this gene transcript, indicated by the solid line, does an 
unusually poor job of describing the underlying observations, being flat over the 
entire range of the time course. However, the rcplicate-level curves, indicated 
by the dashed lines, follow the observations much more closely. Upon further 
investigation, it became clear that this was due to an extreme departure from 
normality for the principal component loadings at the variable level, as can be 
seen in histograms of the initalised loadings for two example data sets given in 
Supplementary Figures 3 and 4, which exhibit heavy tails and varying degrees 
of skewness. To deal with this issue, we propose to instead adopt the assump- 
tion that the variable level loadings follow a skew-t-normal distribution, which 
is flexible enough to account for the heterogenous departures from normality. 
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Formally, we assume that: 



aik ± aik' , k ^ k' aik ± ai'k,i ^ i' 



(3) 



while retaining the assumption that fj^j and are multivariate normally dis- 
tributed, z ~ StN{^,<T'^,X,i') denotes that the random variable z follows a 
skew-t-normal distribution (Gomez, Venegas and Bolfarine, 2007) where ^ is a 
location parameter, is a scale parameter, A is a skewness parameter and v is 
the degrees of freedom controlling the kurtosis. The density of z is given by 



where t^{z; ^, cr^) denotes the Student-t density with v degrees of freedom, loca- 
tion parameter ^ and scale parameter cr^, and $ denotes the normal cumulative 
distribution function. We note here in passing the closely related skew-i distri- 
bution of Azzalini and Capitanio (2003) which is identical in form to (4) except 
that the the normal cumulative distribution function, which controls the skew- 
ness of the density, is replaced by the Student- 1 cumulative distribution function, 
and depends not just on the skewness parameter A but also the degrees of free- 
dom V. As Ho and Lin (2010) point out, evaluating the skew-t-normal density 
is therefore computationally simpler and the decoupling of the skewness from 
the degrees of freedom parameter is more conceptually sound. 

Note that we have chosen not to use a multivariate skew-t-normal density 
for the variable-level loadings as this would require a single degrees of freedom 
parameter to be shared across all components, and we assume that they are 
independent anyway for the purposes of identifiability. Furthermore, we retain 
the original assumption that the replicate-level loadings follow a multivariate 
normal distribution. Judging from Figure 1, which is typical of other variables 
across multiple data sets, the replicate-level curves remain plausible under this 
assumption. 

As in the Gaussian case, we can estimate the parameters under the assump- 
tions given in (3) by treating the principal component loadings as missing data 
and employing the EM algorithm. The maximum likelihood estimators of 0^, 
&a, ®i3i, Dp. and af can be derived analytically as illustrated in the Supple- 
mentary Material. For the parameters of the skew-t-normal distributions, these 
can be estimated using Newton-Raphson (NR) as in Gomez, Venegas and Bol- 
farine (2007) or the EM algorithm as in Ho and Lin (2010). However, the NR 
approach relies on numerical integration and the EM algorithm is known to 
be slow to converge; alternatively, we have found that a simplex optimisation 
(Neldcr and Mead, 1965) works very well in practice. 

The conditional expectations that need to be calculated at the E-step of the 
EM algorithm based on these MLE estimators are summarised in Supplementary 
Table 2. However, expressions for these are challenging to obtain given that 
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under the distributional assumptions, the marginal density of t/,;, as a sum of 
skew-i-normal and normal distributed random variables, follows no known form. 
In these circumstances we can instead turn to the Monte Carlo (MC) EM- 
algorithm (Wei and Tanner, 1990) which replaces the calculation of intractable 
expectations at the E-step with approximations based on averaging random 
samples drawn from the target density. As the target density is itself unknown, 
it is necessary to resort to stochastic simulation algorithms. 

Given that the skew-i-normal distribution admits a convenient hierarchical 
representation, we suggest the use of the Gibbs sampler for drawing samples 
from the joint conditional distribution /(ccj, /3ij|yi). Specifically, following Ho 
and Lin (2010), if t^^ - r(i/a J2, J2) then 



7»fc|r^fc-TjV(0, ^''^^"'° ;(0,oo; 



Tik 



I AT I t I '^Olk^ak 

aiknik,nk ^ ^ Kofc + z — , 



nk + A2 



(5) 



where TN{iJ.,a^\ («:^)) denotes the truncated normal distribution lying within 
the interval (a, 6). It can then be shown (see Supplementary Material) that 



lik\oiik ~ TN{{aik - CaJ — , 1; (0,oo)) 



Tikl^ik ^ r 



(6) 
(7) 



Therefore we introduce additional latent variables, — [th ■ ■ ■ tikY" and 7^ = 
[7ii • • 'liK]^ 1 and the target density for the Monte Carlo E-step becomes the 
joint conditional distribution f{ai,f3i,Ti,'yi\yi) whose form is still unknown. 
However, the conditionals / (a^ , /3, | , , 7^ ) , / (r^ | , a , , /3,, , 7, ) and / (7^ | , , 
follow known distributions that are easy to sample from and hence the Gibbs 
sampler can be used to efficiently generate samples that are approximately dis- 
tributed according to the target full joint conditional density. 

Starting with /(a^, /3i|yi, T;, 7^) note that from (5), conditional on Tik and 

"/ik, Oiik is normally distributed with mean /Iq^ = + ^^"^'^^2'' la^ and variance 
Hence we can write 



OLi 



n,j,^MVN\ 

diag(t>a) ^ 



dmgiv^lBf 
Dp.@lBj 
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where fia = [m^i • • • MaA-]^ and Va = Ki-'-Waxl^^ yy,\T„f^ ^ o-flN,xm + 
B,&c,diag{v^)&lBf + B,@^^D^^&^Bf and D^^ = diag(Dft, • • • , D^J. Us- 
ing a standard result from multivariate statistics (Anderson, 1958), we have 
that f{a.i,f3i\yi,Ti,ji) is therefore also multivariate normal (see Supplemen- 
tary Material for specification of the mean and covariance), which is simple to 
sample from. For f{ji\yi,ai,Pi,Ti) and /(r^lyi, ct^, /3i, 7^), we use (6) and (7) 
respectively. 

2.3. MCEM Algorithm Summary 

Having derived a process for calculating the required maximum likelihood es- 
timators and conditional expectations, we are now in a position to summarise 
the complete MCEM algorithm for estimating the model parameters for the 
skew-t-normal multi-level reduced-rank FPCA model. 

A procedure for initialisation is described in the Supplementary Material. 
After initialisation, the algorithm alternates between approximating the condi- 
tional expectations by running the Gibbs sampler for each variable at the E-step, 
and calculating the maximum likelihood estimators with their sufRcient statis- 
tics replaced by the conditional expectations. Note that the maximum likelihood 
estimators of 9^, and the individual columns of &a and 0^. are interdependent. 
Therefore it is necessary to employ an Expectation Conditional Maximisation 
(ECM) algorithm where each part of the M-step is carried out holding all of the 
other parameters fixed. In other words, first 0^ is estimated holding &a and 
0^. fixed. Next, the first column of 0^ is estimated, holding all other columns 
of 0Q fixed along with 0^. and 0^, and so on. This process can be iterated as 
in James, Hastie and Sugar (2000) or performed once per M-step as in Zhou 
et al. (2010). We have found the latter to work better in practice, in terms of 
numerical stability. 

The maximum likelihood estimates of 0q, and 0^^ are not guaranteed to 
satisfy the constraint that the columns are orthogonal. While it is trivial to 
orthogonalise them by carrying out an eigen-dccomposition, there is a lack of 
consensus as to whether this should be carried out at every iteration as in Zhou, 
Huang and Carroll (2008); Zhou et al. (2010) or once the EM algorithm has 
converged as in James, Hastie and Sugar (2000). In the case of the former we 
sacrifice the monotonicity property of the EM algorithm in order to ensure that 
the estimates remain within the valid parameter space at each iteration. In the 
case of the latter the monotonicity property is retained but intuition suggests 
that the algorithm may converge to parameter values that are far from those 
which maximise the constrained likelihood once the orthogonalisation is carried 
out. In fact, in our experience with the model under the assumption of nor- 
mality, orthogonalising at each iteration results in the algorithm converging to 
a marginally larger likelihood at the expense of many more iterations required 
before convergence, possibly due to some of the iterations decreasing the like- 
lihood. Furthermore, we have found that when the full-rank model is fit, such 
that K and Li are set to the maximum permitted by the choice of spline basis. 



Berk and Montana/A skew-t-normal multi-level reduced-rank FPCA model 



10 



computational instability can occur when orthogonalising at each iteration. An 
alternative to either of these approaches is to carry out the maximisation within 
the constrained parameter space - i.e. all matrices with orthonormal columns - 
as in Peng and Paul (2009). Such methods are difficult to implement due to their 
mathematical complexity and the success of James, Hastie and Sugar (2000); 
Zhou, Huang and Carroll (2008); Zhou et al. (2010) suggest that, practically 
speaking, they are unnecessary. 

The complete algorithm is as follows: 

1. Initialise parameters as described in the Supplementary Material 

2. E-step: For each variable, run the Gibbs sampler for S iterations, sampling 
from f{ai,(3i\yi,T„ji), f{Ti\yi,ai,(3„'y,) and /{■y,\yi,ai,f3i,Ti) in turn 

3. M-step Step 1: Find the maximum likelihood estimators of the parameters 
of the skew-t-normal distributions using a simplex optimisation 

4. M-step Step 2: Update D^.,i = l,---,M 

5. M-step Step 3: Update erf, i = 1, • • • , M 

6. M-step Constrained Maximisation Step 1: Update 0^ while holding &a 
and 0^;,i = l,---,M fixed 

7. M-step Constrained Maximisation Step 2: Update while holding 0^ 
and , i = 1 , • • • , A/ fixed 

8. M-step Constrained Maximisation Step 3: Update 0^^, i = 1, - ■ ■ ,M while 
holding 9f^ and &a fixed 

9. Check for convergence. If not converged, return to 3. 
10. Orthogonalise &a and 0^^, z = 1, • • • , M 

2.4. Model Selection 

Two remaining issues are how to select the number of principal components, 
both at the variable- and replicate-level, and what spline basis to use. For select- 
ing the number of principal components, two main approaches can be considered. 
In the first, the proportion of variance explained by each principal component 
function can be approximated by fitting the full-rank model - such that K and 
Li (for all i) are the maximum permitted by the number of design time points 
(James, Hastie and Sugar, 2000). For the second method, cross-validation is 
used to score each potential value of K and Li (Zhou et al., 2010). Note that 
both of these approaches require a subjective interpretation. For the proportion 
of variance explained method, either an arbitrary cutoff for cumulative variance 
such as 95% or 99% must be used, perhaps aided by an examination of a scree 
plot and a visualisation of the principal component functions in order to as- 
certain their interpretability. On the other hand, as Zhou et al. (2010) discuss, 
when using the cross-validation method on real data the score may simply de- 
crease as the number of principal components increases, which results in always 
selecting the full-rank model going by the score alone. Therefore they suggest 
to instead subjectively trade-off between parsimony and cross-validation score, 
again with the aid of a scree plot. However, in the model of Zhou et al. (2010), 
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the number of principal components at the second-level is not dependent on the 
first-level and so the cross-validation method is much more tractable in their set- 
ting than ours, where the high-dimensional optimisation renders the approach 
impractical. By necessity therefore, we employ the proportion of variance ex- 
plained method. On simulated data we have discovered that this works very well 
at identifying the correct number of principal components at the variable-level, 
but struggles at the replicate-level, most likely due to the much smaller sample 
sizes. We therefore suggest that a (much) more conservative criteria be applied 
at the replicate-level. For instance, we have found that retaining those principal 
components that explain 99% of the variance at the variable-level and 60% of 
the variance at the replicate-level worked well on these data sets. 

For selecting the spline basis we suggest to use natural cubic splines with a 
knot placed at each design time point for several reasons. Firstly, this avoids 
the computational burden of having to select the number of knots. Secondly, 
many of the data sets provided by our collaborators have unequally spaced time 
points, and it therefore makes sense to use each time point as a knot in order 
to adequately capture the temporal dynamics. Thirdly, the reason to counte- 
nance against such an approach would be the danger of overfitting. This is 
accounted for through the use of the reduced-rank model, essentially placing a 
rank-constraint on the covariance matrix of the spline basis coefficients. Con- 
ceptually speaking, this approach to spline basis selection is quite similar to the 
use of smoothing splines, where a knot is placed at each design time point and 
overfitting is avoided through the use of a penalty parameter on the likelihood. 

3. Results 

3.1. Simulation comparison of the Gaussian single- and multi-level 
reduced-rank FPCA models 

We set out to determine whether single-level approaches that assume the vari- 
ables are independent are adequate when it comes to estimating the true un- 
derlying curves, and to quantify the improvement that can be gained through 
the use of a multi-level model using the following simulation setting. 

We generated data under the multi-level reduced-rank FPCA model (2) with 
normality assumed, as the skew-i-normal model is too computationally inten- 
sive, with its reliance on MC methods, for large scale simulation studies. We 
fixed the number of principal components at the variable level to if = 2 with 
a single principal component at the replicate level for each variable, so that 
Li = 1 for all i. We used a B-spline basis with a single knot placed at the centre 
of the time course. The spline coefficients for the grand mean, 0^, were fixed to 
produce the curve that can be seen in Figure 3. 

The spline coefficients for the variable-level principal component functions, 
and were chosen in order to produce the simulated curves given in 
Figure 2. Visualising the grand mean plus and minus each principal component 
function is a typical way of helping to understand their effect (Ramsay and 
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Silverman, 2005). These plots are given in Figure 3. The solid line is the grand 
mean. The points denoted by '+' are the function fi{t) + CQk{t) evaluated on 
a coarse grid of points where C is some constant responsible for scaling the 
principal component function and subjectively chosen in order to aid clarity 
of the visualisation. Similarly, the points denoted by '— ' are the same except 
for ^i{t) — CC,k{t). From these plots it should be clear that the first principal 
component has the effect of rotating the first half of the time course, which 
alters both the level at which the time course begins and the level to which 
it peaks. To a lesser extent the exact time at which the peak occurs is also 
affected. On the other hand, the second principal component function rotates 
the second half of the time course, thereby controlling whether the curve levels 
off, continues to decrease or starts to increase after the peak and the dip. 

For the replicate-level principal components, the same principal component 
function was fixed for all variables. Note that this only serves to simplify the sim- 
ulation scheme, and the multi-level reduced-rank FPCA model will still estimate 
the between-replicate variation for each variable independently. By choosing the 
spline coefficients to produce the profile given on the left hand side of Figure 
4, the effect of the principal component function is to scale the height to which 
the curves peak, as shown on the right hand side of the same Figure. 

The remaining parameters to be determined are the variance components Da , 
Dp. and erf for all i. We set Da = diag(0.3, 0.1) so that 75% of the variance 
at the variable level is explained by the first principal component function. The 
single element of Dp. was set to 0.075 for all i so that the level of between- 
replicate variance is less than that of the between-variable. Similarly, the noise 
was set to cr? = 0.05 for all i. As before, this only serves to simplify the simulation 
scheme and separate estimates will be made for Dp. and trf for each variable. 
The simulation scheme described here is flexible enough to produce a wide range 
of believable curves as evidenced by the examples given in Supplementary Figure 
5. 

In order to compare the single- and multi-level reduced-rank FPCA models 
under a range of experimental designs and to determine whether standard prac- 
tice is appropriate, we generated different data sets by varying the number of 
replicates as either 5, 10 or 20. Roughly speaking, with regards real data sets, 
these correspond to realistic, less frequent and unrealistic numbers of replicates 
respectively. The number of time points per data set was fixed to 5. We fo- 
cused on evaluating the ability to estimate the variable-level curves and for this 
reason we also varied the number of variables between 100, 1,000 and 10,000. 
Although broadly speaking 10, 000 is the only realistic value of the three for 
'omics data sets (unless the variables have been subject to some initial filter- 
ing procedure) , we were interested in exploring the properties of the multi-level 
model and determining how many variables are required to adequately assess 
the between-variable variation. 

We generated 1,000 data sets under each condition. For each data set we 
fit both the single- and multi-level reduced-rank FPCA models under the as- 
sumption of normality. For the purposes of this study we chose not to consider 
the issue of correctly selecting the number of principal component functions at 
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Fig 2. Two simulated variable-level principal component functions used in our simulation 
study to compare the Gaussian single- and multi-level reduced-rank FPCA models. The prin- 
cipal component {unctions were set to explain 75% and 25% o/ the variance in the data 
respectively. 




Fig 3. The effect of the two simulated principal component functions on the simulated grand 
mean from our simulation study to compare the Gaussian single- and multi-level reduced-rank 
FPCA models. The first principal component function rotates the first half of the time course, 
therefore affecting the height to which the functions peak, the level at which the time course 
begins and, to a lesser extent, the exact time at which the peak occurs. Conversely the second 
principal component function rotates the second half of the time course, thereby controlling 
whether the curve levels off by the end of the time course, continues to decrease or starts to 
increase. 
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Simulated replicate-level PC function for Simulated grand mean witli the effect of 

multi-level r«duced-r3nl< fPCA model the replicate-level simulated PC 




Fig 4. Profile of the simulated replicate-level principal component function used in our simu- 
lation study to compare the Gaussian single- and multi-level reduced-rank FPCA models, left, 
and its effect on the grand mean, right. The principal component function has the effect of 
scaling the height to which the curve peaks. 

the variable- and replieate-levels and the number or location of the knots of the 
spline basis and simply input the correct values to each algorithm. We compared 
each variable-level curve by discretising it on a fine grid of points and calculat- 
ing the mean squared error between it and the underlying true curve. Wc then 
averaged this error across all variables and all data sets to give a single measure 
for each model for each condition. 

The complete results of the simulation study are presented in Supplementary 
Tables 3 and 4. In all scenarios the multi-level model substantially improves 
upon the single-level model. For the multi-level model, doubling the number of 
replicates roughly halves the estimation error. Howciver, increasing the number of 
variables has a much less pronounced effect, suggesting the multi-level model still 
performs well when data sets have been pre-filtered. For the single-level model, 
doubling the number of replicates similarly roughly halves the estimation error. 
As expected, increasing the number of variables for this model has no effect on 
the estimation error as each variable is fit in isolation. The most salient insight to 
glean from these results is in the comparison between the two tables. Comparing 
first the case of 5 replicates and 10, 000 variables for the two models, the multi- 
level model offers an improvement in estimation error of approximately a factor 
of ten. Considering the single-level model alone, in order to attain an equivalent 
improvement in estimation error, the number of replicates needs to quadruple 
from 5 to 20. These results suggest that simply using a more sophisticated model 
has the potential to dramatically reduce experimental costs. 
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3.2. Real data analysis 

We fit the skew-t-normal multi-level reduced-rank FPCA model to a genomics 
data set study the genetic response to infection by BCG, the vaccine for tu- 
berculosis, in 9 human volunteers. We determined that the MCEM algorithm 
had converged after 1050 iterations by examining parameter trace plots. On the 
right hand side of Figure 1 we show the fit obtained under the new model to 
the CCL20 transcript. As can be clearly seen, the mean curve is much more 
reasonable, closely following the underlying observations. 

The variable-level principal component functions are given in Figure 5. The 
first principal component - which explains a huge proportion of the variance, 
99.998% - is responsible for vertically shifting a given transcript. In this respect, 
it is the most uninteresting of the principal component functions as it does not 
control the shape of the curves. We stress that it is not suprising that such a 
large proportion of the variance is explained by vertical shifts given that very 
few of the tens of thousands of variables in an 'omics data set will actually 
exhibit significant changes over time. It is therefore still instructive to consider 
the other principal component functions even if the proportion of variance they 
explain may lead to them being overlooked in more traditional PCA application 
areas. 

The second principal component function accounts for those transcripts which 
rapidly spike before plateauing at some elevated level of expression or, con- 
versely, are rapidly repressed. The CCL20 transcript is a example of this. The 
third principal component function describes those transcripts which are in- 
duced or repressed more slowly, tailing off at around 7 or 8 hours. In Supple- 
mentary Figure 6 we give an example of a transcript that exhibits this profile 
and was found to have the highest positive loading on the third principal com- 
ponent function. More complex profiles can be explained by a combination of 
these two components. The fourth principal component function is less inter- 
pretable but appears to be mainly controlling for variation at the end of the 
time course. The fifth principal component function is even less interpretable 
and should probably be discarded when the data is fit for a second time under 
the reduced-rank model. 

4. Discussion 

To date, multi-level functional models such as the one we have presented here 
have yet to be applied in the 'omics fields that we focus on. In other domains, 
Zhou ct al. (2010) independently developed a similar multi-level reduced-rank 
FPCA model under the normality assumption. Their model differs from ours 
in a few key respects. Firstly, the second-level prinicpal component loadings (in 
our model, this would be the replicate-level) are not specific to the first-level. In 
other words, in our context, their model would assume that the within- variable 
variation is identical for all variables. As we motivated with Supplementary 
Figures 1 and 2, real data does not support this assumption. Secondly, they 
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Principal Component 1 Principal Component 4 

Variance explained - 99.998115 % Variance explained = 0.000119 



Hours Hours 

Principal Component 2 Principal Component 5 
Variance explained = 0.001051 % Variance explained - 0.000001 % 



7- 



Hours Hours 

Principal Component 3 
Variance explained = 0.000715 % 



i 6 ?4 

Hours 

Fig 5. Variable-level principal component junctions obtained from fitting the skew-t-normal 
multi-level reduced-rank fPCA model to our example real data set. The solid line is the grand 
mean across all transcripts. The points '+' indicate the effect of adding each principal com- 
ponent function 'multiplied bij a constant C to the grand mean, where C is simply subjectively 
chosen to aid the visualisation. Similarly the points '- ' indicate the effect of subtracting each 
principal component function multiplied by C from the grand mean. See discussion in the 
main text. 
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allow for the replicate-level loadings to be correlated within a given variable. In 
this respect their model is an extension of ours; however, this is motivated by 
spatial dependencies in their case study that do not exist in our example data 
sets. Thirdly, they use a penalised spline representation for the principal com- 
ponent functions. In principle this should allow for a more data-driven approach 
to smoothing than our choice of natural splines with the maximum number of 
knots. However, we note that their data set has many more 'time' points (in fact, 
the dependent variable is distance), than our case studies (20 — 40) and therefore 
smoothing is more likely to be an issue if the function has been oversampled. 
Furthermore, they select only three distinct smoothing parameters, one for the 
grand mean, one for all variable-level principal component functions, and one for 
all replicate-level principal component functions. Although it is clear to under- 
stand the computational burden that motivates such a restriction, it seems to do 
away with the advantage that motivates penalised estimation in the first place, 
which is to account for principal component functions of varying smoothness. 
Fourthly and finally, they fit a single error variance for all variables. It is well- 
known that noise in microarray experiments is transcript-dependent (Tusher, 
Tibshirani and Chu, 2001) and so such a noise model would be inadequate for 
our purposes. Finally, the model is demonstrated on a data set with far fewer 
variables than our case studies (3 comapred with on the order of 10, 000) which 
may explain why they did not experience the same problems with assuming 
normality that we did. 

Although we have demonstrated that the MCEM algroithm for the skew- 
i-normal model yields biologically interpretable results on real data, the com- 
putational burden is severe. We suggest several lines of attack for developing 
a practical algorithm that can fit the model in a more reasonable time frame. 
Firstly, a different sampling scheme could be employed in the MCEM algorithm, 
specifically importance sampling using a multivariate t proposal with a small 
degrees of freedom parameter. Moving away from the EM-algorithm, there may 
be the potential for an (approximate) analytical solution. In particular, the work 
of Forchini (2008) on deriving the form of the density of the sum of a normal and 
a Student-t distributed random variable may provide some guidance. Deriving 
the density of the sum of a normal and a skew- normal random variable would 
be an important first step. However, the result of Forchini (2008) relies on trun- 
cating a Taylor series expansion and so careful consideration should be given as 
to the accuracy of such an approximation. Ultimately, however, we believe that 
the most fruitful line of further investigation is a fully Bayesian approach, taken 
by imposing prior distributions on the model parameters. Parameter estimation 
could then be carried out using a modified version of our Gibbs sampler. Jara, 
Quintana and Martin (2008) have already demonstrated that such a Bayesian 
approach works well for single- level mixed-effects models with either skew-t or 
skew-normal random-effects. 
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Supplementary Material for "A Skew-t-Normal Multi-Level 
Reduced- Rank Functional PC A Model with Applications to 
Replicated 'Omics Time Series Data Sets" 



1 The Gaussian multi-level reduced-rank FPCA model 

For the Gaussian case we take the following distributional assumptions: 



MVN{0,a^I, 



NiiXNi 



(1) 



where the K x K and Li x Lj matrices Da and Dfj^ are both assumed to be diagonal otherwise they would 
be confounded with ©„ and ©/j.. Furthermore, we assume that a^, /3jj and Cij are all independent of each 
other for all i and j. To enforce the orthogonality constraint on the functions Ck(t) and rjuit), in addition to 
transforming the spline basis, we also impose that 0^0^ = Irxk and 0^.0^. = iLixLf 
Under the distributional assumptions given above, y, is marginally distributed as 



yi^MVN{Bie^,Vi) 



where 



As in James et al. (2000), the model parameters 



(2) 



where M is the total number of variables in the data set, can be estimated by treating the principal component 
loadings as missing data and employing the EM algorithm. Under the distributional assumptions given above, 
the complete data log-likelihood is 
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where C is an additive constant. We will now proceed to derive the maximum likelihood estimators of the 
model parameters, followed by the required conditional expectations for the EM algorithm. 
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1.1 MLE of 

To derive the maximum likelihood estimator of 0^ first ignore all irrelevant terms in (3) and for succinctness 
write Ui — BiO^ — Bi&aCti — Bi&^^^i = y* — Bidfj,. Then the partial derivative with respect to 6^ yields 
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Equating to zero and solving for 9^ gives 



= ^ar2 [-2Bjyl+2BjB,d^] 



M 



M 



Y,{BlB,)e, = Y,Bjv* 

8=1 i=l 
M 

r, = {Y,BjB,r'Y.Bly- 

i=l 1=1 
M M 



M 



i=l 



(4) 



1.2 MLE of e 



oik 



For the maximum likelihood estimator of 9a^, which recall is the fc-th column of proceed in exactly the 

same way as for 0^. For clarity first redefine vl = Vi — Bi9ij, — Bi J2k'jtk [^a^/'^ik'] — Then, taking 

the partial derivative of the relevant terms of (3) with respect to gives 
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Equating to zero and solving for 0^^. gives 
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and so 
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The derivation for the maximum likelihood estimator of 9p^^ is similar to that for 9^^ except this time we only 
need to consider observations on variable i. Hence we define y*^ = yij—Bij9^—Bij&ai^Bij ['^Puil^iji'] 
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and take the partial derivative of the relevant terms in (3) with respect to 0^,, which yields 



Equating to zero and solving for 0^., gives 
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and so 
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1.4 MLE of Do, 

As Da is diagonal we consider estimating each diagonal element separately. The relevant terms in (3) are 

1 v^M 



X^j^i [log I Da I + cxfD^ ^cxi] . As Da is diagonal we can write 
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where denotes the fc'-th diagonal element of D^- Considering the case of estimating the fc-th 

diagonal element of D^, we first separate out the relevant terms 
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Then we take the partial derivative with respect to [Da\kk 
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Equating to zero and solving for [Da\kk gives 
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1.5 MLE of D^^ 

Following exactly the same procedure to dervive the maximum likelihood estimator of D^. gives 
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1.6 MLE of 

To derive the maximum likelihood estimator of af first make the substitution = t/i — Bi9^ — Bi&aC^i — 
Bi&p.^i for clarity. Then the relevant terms of (3) are — | [Ni log af + a~'^efeij. Taking the partial 
derivative with respect to gives 
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Equating to zero and solving for ct? gives 



1.7 Conditional Expectations 
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Supplementary Table 2 summarises the requried conditional expectations for the EM algorithm, based on the 
sufficient statistics of the maximum likelihood estimators derived above. In order to derive these conditional 
expectations we first write 
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Supplementary Table 1: The required conditional expectations for the EM algorithm for the Gaussian multi- 
level reduced-rank fPCA model 
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Supplementary Table 2: The required conditional expectations for the EM algorithm for the skew-t- normal 
multi-level reduced-rank FPCA model 



where recall that Vi is given in (2). Using the standard result given in Anderson (1958) we can then write 
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Hence we immediately have 

E[ai\yi] = a,= D^@lBjVr\yi - BiO^) 
Emyi] =0i = D;,&^Bfvr\yi - B^O^) 
and the remaining results follow from the definition of covariance 

E[aiaJ\yi] = o^f = SiSf + L>„ - D^@lBjVr^Bi&^D^ 
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For the case of E[ejei\yi\ first note that E[ejei\yi\ — tr{E[eieJ\yi]). Then, letting = y^ — BiO^ — 
BiQaOi — Bi@/3./3i and from the definition of covariance we have 
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2 Deriving the posterior distributions of jik and Tik 

Recall that the skew-t-normal density is given by 



(16) 



The joint density /(ajfc, 7ifc, m) is given by 
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We first integrate 'jik out of the joint density f{(Xik,Tik,'yik) to obtain 
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Making the substitution x = ^ik — [ctik — Cat ) , the integration becomes 
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where $(•) denotes the cdf of the standard normal distribution. Hence we have 
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Dividing (17) by (18) gives f{-fik\aik,Tik) as 
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As Tik does not appear anywhere on the right hand side, it implies that, conditional on aik, Jik and Tik are 
independent. Hence 

f{'yik\aik) = —? ^(^'fc ^ ^) exp i ~ ( 7ifc - {aik - J — ) i (19) 

From (19) it follows that jik\aik ~ TN{{aik — ^at)-^, 1; (0,oo)) and so 

i^[7ifeM = (aifc-^aJ^ + ^ f4 (20) 

'^"^ $((a,,-^„J^) 

For f{Tik\aik), we divide (18) by (16) yielding 

''^"r((.<.,+l)/2) 1, 2a^^ ; 

exp I 1 (21) 

and so 

Tik\<^ik ~ J- 
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4 EM algorithm initialisation 

To obtain initial estimates of the parameters, the following procedure is adopted, with similar approaches 
used elsewhere in the literature (Peng and Paul 2009; Zhou et al. 2010). 

First a spline is fit to the entire data, ignoring variable and replicate labels, using least squares in order 
to obtain initial values for 6^. Next, the grand mean is subtracted from all observations and each variable 
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is fit independently with a spline using least squares. With M variables in the data set and a p-dimensional 
spline basis, this results in an M x p matrix of spline basis coefficients. A standard PCA is performed on 
this matrix yielding p eigenvectors each of length p. The first K eigenvectors are taken to be the initial 
©a matrix, and skew-t-normal distributions are fit to the loadings in order to initialise ^q^. , ct^^ , and 
Ooik ■ Next the variable means are subtracted from the observations and each replicate for each variable is fit 
independently with a spline using least squares. Note that it will be necessary to instead perform a ridge 
regression in those cases where the replicate has not been observed at all time points as the matrix BfjBij 
will not be of full rank. In fact, we suggest to perform a ridge regression in all instances, even when the 
matrix Bj-Bij is of full rank, as this imposes some smoothness and avoids overfitting the data which may 

lead to unrealistically small initial values for cr?. As a result of this procedure, an x p matrix of spline 
basis coefficients is obtained for each variable. As before, a standard PCA is performed on this matrix and 
the ffist Li eigenvectors retained as the initial values for 0^. . The corresponding eigenvalues are used for the 
initial values for the diagonal element of D^^ . After subtracting these replicate curves from the observations 
we arc left with initial estimates and the sample variance of these residuals can be used as initial values 
for cr? for each variable. 
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IFNG 
Interferon, gamma 




Supplementary Figure 1: Raw data for expression levels of interferon-gamma measured in blood samples 
from nine human patients to which the BCG vaccine for tuberculosis was added. Interferon-gamma is well- 
known to play an important role in the immune response to tuberculosis infection. Note the heterogeneous 
response with patients two, three, seven and eight exhibiting flat profiles, while the other patients end the 
time course with elevated levels of gene expression. Compare this to the much more homogeneous profiles 
for tumour necrosis factor-alpha given in Supplementary Figure 2 
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TNF 

Tumor necrosis factor (TNF superfamily, member 2) 




Supplementary Figure 2: Raw data for expression levels of tumour necrosis factor-alpha measured in blood 
samples from nine human patients to which the BCG vaccine for tuberculosis was added. Tumour necrosis 
factor-alpha is well-known to be associated with tuberculosis infection. Note the subtle differences between 
the profiles such as the expression level at which the time course begins, the level to which they peak, and 
whether they plateau, continue to rise or start to fall by the end of the time course. However, compared with 
the data given in Supplementary Figure 1 for interferon-gamma, these profiles are much more homogeneous 
across the different patients. 
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Supplementary Figure 3: Initialised variable- level loadings for a genomics data set studying the genetic 
response to BCG infection. Note the departure from normality in all instances with many outliers and 
varying levels of skewness. 
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Supplementary Figure 4: Initialised variable-level loadings from a metabolomics toxicology study. Note the 
departure from normality in all instances with many outliers and varying levels of skewness. The extreme 
skew for the loadings on the first principal component, such that the distribution is essentially a truncated-i 
distribution, is a result of the fact that there are no negative observations in this data set. 
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Supplementary Figure 5: Example simulated data for two variables produced under the simulation setting 
for the Gaussian multi-level reduced-rank fPCA model. The solid black lines correspond to the variable 
mean curve. The coloured dashed lines are the individual replicate curves. For clarity only three replicates 
are shown. Final observations including simulated noise are shown as circles, and are also colour-coded 
for replicate. Note how the combination of the two variable-level and the single replicate-level principal 
component functions described above result in a wide range of replicate curves. 
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SDC2 




Hours 



Supplementary Figure 6: Fit to an example transcript from the BCG genomics data set obtained under the 
skew-i-normal multi-level reduced-rank FPCA model. This transcript, corresponding to gene SDC2, was 
found to have the highest positive loading on the third principal component function given in Figure 5 in 
the main text. This principal component function account for those probes which are induced or repressed 
slowly until around 8 hours before levelling off. Unsurprisingly, SDC2 is a prime example of this. 
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Number of replicates 
5 10 20 

100 0.000406 (0.000424) 0.000195 (0.000205) 0.0000953 (0.0000999) 
Number of variables 1000 0.000348 (0.000388) 0.000167 (0.000186) 0.0000813 (0.0000903) 

10000 0.000342 (0.000387) 0.000165 (0.000185) 0.0000799 (0.0000897) 

Supplementary Table 3: Estimation error for variable-level curves using the Gaussian multi-level reduced-rank fPCA model. Values shown are 
the mean (standard deviation) MSE across all variables in 1000 simulated data sets 



Number of variables 



Number of replicates 
5 10 20 

100 0.0022G (0.00224) 0.00113 (0.00112) 0.000562 (0.000559) 
1000 0.00225 (0.00223) 0.00113 (0.00112) 0.000564 (0.000561) 
10000 0.00226 (0.00224) 0.00113 (0.00112) 0.000564 (0.000560) 



Supplementary Table 4: Estimation error for variable-level curves using the Gaussian single-level reduced-rank fPCA model. Values shown are 
the mean (standard deviation) MSE across all variables in 1000 simulated data sets 
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